The Everlasting Database: Statistical Validity at a Fair Price
Authors
Abstract
The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries. We propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional samples. Crucially, we guarantee statistical validity without any assumptions on how the queries are generated. We also ensure with high probability that the cost for M non-adaptive queries is O(log M), while the cost to a potentially adaptive user who makes M queries that do not depend on any others is O( √
Similar resources
Estimation of quantitative characteristics considering CPI microdata in Iran
The aim of this study is to estimate the known statistical characteristics of nominal price stickiness in the Iranian economy during the years 1390 to 1399 and at different commodity levels of microdata of consumer price index (including product category, Coicop commodity group, and the whole economy) and thus the stickiness between categories and product groups are also compared. For this purp...
Analytical Review of Fair Distribution of Recreational and Sport Services in by Using Topsis Model
Background. Fair distribution of sports facilities strongly influences the tendency of citizens to exercise. Therefore, the distribution of sports and recreational facilities in cities should be carefully and scientifically explored. Objectives. The purpose of this study was an analytical review of the fair distribution of recreational and sport services in the city of Mashhad using the Topsis model....
Relating Fairness and Timing in Process Algebras
This paper contrasts two important features of parallel system computations: fairness and timing. The study is carried out at specification system level by resorting to a well-known process description language. The language is extended with labels which allow to filter out those process executions that are not (weakly) fair (as in [5,6]), and with upper time bounds for the process activities (...
Analyzing the Factors Affecting on Price Premium to Ecotourism (Case Study: Isfahan Mesr Desert)
Desert tourism is part of the tourism industry; trips and hikes in desert and wasteland areas have created a specific type of tourism called desert tourism. Recognizing the factors that affect willingness to pay a price premium for ecotourism can lead a tourism destination to success. Therefore, the intention of this research is to identify and rank the factors that affect paying Pric...
Stock Market Fraud Detection, A Probabilistic Approach
In order to have a fair market condition, it is crucial that regulators continuously monitor the stock market for possible fraud and market manipulation. There are many types of fraudulent activities defined in this context. In our paper we will be focusing on "front running". According to Association of Certified Fraud Examiners, front running is a form of insider information and thus is very ...